AITopics | k-means clustering algorithm

k-means clustering algorithm

Understanding K-Means Clustering Algorithm - Analytics Vidhya

#artificialintelligence, Mar-24-2022, 02:00:20 GMT

With the rising use of the Internet, the quantity of data being created is enormous. Even though individual data items are often simple, the sheer amount of data to be analyzed makes processing difficult even for computers. Managing such workloads requires tools for analyzing large datasets. Data mining methods and techniques, in conjunction with machine learning, enable us to analyze large amounts of data in an intelligible manner. K-means clustering is one such technique: it groups unlabeled data into a predetermined number of clusters (k) based on similarity.

centroid, dataset, k-means, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

K-Means Clustering Algorithm

#artificialintelligence, Oct-30-2021, 16:35:50 GMT

To process the learning data, the K-means algorithm in data mining starts with an initial group of randomly selected centroids, which serve as the starting points for the clusters, and then performs iterative (repeated) calculations to optimize the positions of the centroids. You define a target number k, which is the number of centroids you need in the dataset. A centroid is the imaginary or real location representing the center of a cluster. Every data point is allocated to exactly one cluster so as to reduce the within-cluster sum of squares. In other words, the K-means algorithm identifies k centroids and then allocates every data point to the cluster of its nearest centroid, while keeping the within-cluster sum of squares as small as possible.
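The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names (`k_means`, `within_cluster_ss`), the `max_iters` and `seed` parameters, and the choice to initialize from k randomly sampled data points are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) after alternating assignment and update steps."""
    rng = random.Random(seed)
    # Initialization (an assumption of this sketch): k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Within-cluster sum of squares, the quantity k-means tries to reduce."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
```

Calling `k_means(points, 2)` on a list of coordinate tuples returns the final centroids and one cluster index per point. Because the result depends on the random starting centroids, it is common to run the procedure with several seeds and keep the run with the lowest `within_cluster_ss`.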

centroid, data mining start, k-means clustering algorithm, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.85)

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

Mohammadi, Seyed Omid, Kalhor, Ahmad, Bodaghi, Hossein

arXiv.org Artificial Intelligence, Oct-9-2021

This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets, MNIST and Fashion-MNIST, to prove that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then input them as initial points to the k-means algorithm to fine-tune the results.
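To give a rough feel for what splitting a cluster along a dominant axis can look like, here is a deliberately simplified sketch. It is not the k-splits algorithm from the paper: it assumes the "most significant axis" is simply the coordinate with the largest variance and it always splits at that coordinate's mean, whereas the paper's actual axis selection and its test for whether a split is needed are not reproduced here. The function names are hypothetical.

```python
def variance_per_axis(points):
    """Population variance of each coordinate across a non-empty list of points."""
    n = len(points)
    variances = []
    for coords in zip(*points):
        mean = sum(coords) / n
        variances.append(sum((c - mean) ** 2 for c in coords) / n)
    return variances


def split_along_widest_axis(points):
    """Split one cluster in two at the mean of its highest-variance coordinate.

    Simplifying assumption: a coordinate axis stands in for the paper's
    "most significant data distribution axis".
    """
    variances = variance_per_axis(points)
    axis = max(range(len(variances)), key=lambda i: variances[i])
    threshold = sum(p[axis] for p in points) / len(points)
    left = [p for p in points if p[axis] <= threshold]
    right = [p for p in points if p[axis] > threshold]
    return axis, left, right
```

The means of the two halves could then be passed as starting centroids to an ordinary k-means run, in the spirit of the fine-tuning step the abstract suggests.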

algorithm, centroid, dataset, (15 more...)

arXiv.org Artificial Intelligence

2110.0466

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Use-Cases of K-Means Clustering

#artificialintelligence, Aug-12-2021, 12:51:02 GMT

In this blog, we will first see what the K-Means Clustering algorithm is and then discuss some of its industry use cases. Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to act on that data without any supervision. Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format. K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters.

algorithm, dataset, k-means clustering, (11 more...)

#artificialintelligence

Industry: Law Enforcement & Public Safety > Fraud (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

K-Means Clustering Algorithm

#artificialintelligence, Aug-30-2020, 19:56:10 GMT

K-Means Clustering With Python will help you to comprehensively learn all the concepts of the k-means algorithm in machine learning. K-means clustering is one of the most common data analysis techniques used to get an intuition about the structure of the data. It has various applications, such as identifying fake news, filtering spam mail, and customer segmentation.

artificial intelligence, k-means clustering algorithm, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Grouping the executables to detect malware with high accuracy

Sahay, Sanjay K., Sharma, Ashu

arXiv.org Artificial Intelligence, Jun-22-2016

Metamorphic malware variants with the same malicious behavior (family) can obfuscate themselves to look different from each other. This variation in structure leads to a huge signature database for traditional signature matching techniques to detect them. For effective and efficient detection of malware in large amounts of executables, we need to partition these files into groups that can identify their respective families. In addition, the grouping criteria should be chosen in such a way that they can also be applied to unknown files encountered on computers for classification. This paper discusses the study of malware and benign executables in groups to detect unknown malware with high accuracy. We studied the sizes of malware generated by three popular second-generation malware (metamorphic malware) creator kits, viz. G2, PS-MPC and NGVCK, and observed that the size variation between any two malware samples generated by the same kit is small. Hence, we grouped the executables on the basis of malware sizes by using the Optimal k-Means Clustering algorithm and used the obtained groups to select promising features for training classifiers (Random forest, J48, LMT, FT and NBT) to detect variants of malware or unknown malware. We find that detection of malware on the basis of their respective file sizes gives accuracy up to 99.11% from the classifiers.

artificial intelligence, machine learning, malware, (16 more...)

arXiv.org Artificial Intelligence

1606.06908

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Histogram-Based Method for Effective Initialization of the K-Means Clustering Algorithm

Gingles, Caroline (Louisiana State University in Shreveport) | Celebi, M. Emre (Louisiana State University in Shreveport)

AAAI Conferences, May-7-2014

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, this algorithm is highly sensitive to the initial selection of the cluster centers. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have superlinear complexity in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the order in which the data points are processed. These methods are generally unreliable in that the quality of their results is unpredictable. In this paper, we propose a linear, deterministic, and order-invariant initialization method based on multidimensional histograms. Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrate the superiority of our method over the well-known maximin method.
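For context, the maximin baseline mentioned in the abstract can be sketched as follows. This shows the comparison method only, not the histogram-based method the paper proposes. The sketch assumes the first center is simply the first data point, which is one common convention; the function name `maximin_init` is hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def maximin_init(points, k):
    """Choose k initial centers with the maximin rule.

    Each new center is the data point whose distance to its nearest
    already-chosen center is largest.
    """
    centers = [points[0]]  # assumption of this sketch: start from the first point
    # nearest[i] is the squared distance from points[i] to its closest chosen center.
    nearest = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        # Refresh the nearest-center distances using only the newly added center.
        nearest = [
            min(d, squared_distance(p, points[idx]))
            for d, p in zip(nearest, points)
        ]
    return centers
```

Each round costs one pass over the data, so the whole procedure is linear in the number of points for a fixed k. Starting from the first point makes the result depend on the order of the data, which is the kind of sensitivity the abstract describes.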

artificial intelligence, k-means clustering algorithm, machine learning, (2 more...)

AAAI Conferences

The Twenty-Seventh International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm

Fausett, Adam (Louisiana State University in Shreveport) | Celebi, M. Emre (Louisiana State University in Shreveport)

AAAI Conferences, May-19-2013

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, the nearest neighbor search step of this algorithm can be computationally expensive, as the distance between each input vector and all cluster centers needs to be calculated. To accelerate this step, a computationally inexpensive distance estimation method can be tried first, resulting in the rejection of candidate centers that cannot possibly be the nearest center to the input vector under consideration. This way, the computational requirements of the search can be reduced as most of the full distance computations become unnecessary. In this paper, a fast nearest neighbor search method that rejects impossible centers to accelerate the k-means clustering algorithm is presented. Our method uses geometrical relations among the input vectors and the cluster centers to reject many unlikely centers that are not typically rejected by similar approaches. Experimental results show that the method can reduce the number of distance computations significantly without degrading the clustering accuracy.
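The general idea of rejecting centers without computing their full distance can be illustrated with the standard triangle-inequality test. This is a generic sketch of that well-known bound, not the specific geometric method presented in the paper. The function names and the precomputed `center_dists` table are assumptions of the example.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_center_distances(centers):
    """Table of center-to-center distances, computed once per k-means iteration."""
    return [[distance(a, b) for b in centers] for a in centers]


def nearest_center_with_pruning(x, centers, center_dists):
    """Return (index of the nearest center, number of full distance computations)."""
    best = 0
    best_dist = distance(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # Triangle inequality: if d(c_best, c_j) >= 2 * d(x, c_best),
        # then d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
        if center_dists[best][j] >= 2.0 * best_dist:
            continue
        d = distance(x, centers[j])
        computed += 1
        if d < best_dist:
            best, best_dist = j, d
    return best, computed
```

The center-to-center table costs k squared distance computations, but it is shared by every input vector in the iteration, so the saving grows with the number of points. The pruning never changes which center is selected, since a rejected center is provably no closer than the current best.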

accelerated nearest neighbor search method, k-means clustering algorithm

AAAI Conferences

The Twenty-Sixth International FLAIRS Conference

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
